Improving AdaBoost for Classification on Small Training Sample Sets with Active Learning
Authors
Abstract
Recently, AdaBoost has been widely used in many computer vision applications and has shown promising results. However, its classification performance is often poor when the training sample set is small. In many situations, a large pool of unlabelled samples is available, but labelling them is costly and time-consuming; it is therefore desirable to pick a few good samples to label. The key question is how. In this paper, we integrate active learning with AdaBoost to address this problem. The principal idea is to select, as the next sample to label, the unlabelled sample at the minimum distance from the optimal AdaBoost hyperplane derived from the current set of labelled samples. Using the concept of version spaces, we prove that this selection strategy yields the fastest expected learning rate. Experimental results on both artificial and standard databases demonstrate the effectiveness of the proposed method.
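The selection strategy described above can be sketched as a simple uncertainty-sampling loop: after each query, an AdaBoost classifier is refit on the labelled pool, and the unlabelled sample with the smallest absolute margin (i.e., closest to the decision boundary) is labelled next. This is a minimal illustration using scikit-learn's `AdaBoostClassifier`, not the authors' exact implementation; the dataset, pool sizes, and number of queries are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's code): active learning with
# AdaBoost, querying the unlabelled sample closest to the decision boundary.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Seed the labelled pool with 5 samples of each class; everything else is
# "unlabelled" (in this simulation, the oracle labels are simply read from y).
labelled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabelled = [i for i in range(len(y)) if i not in labelled]

for _ in range(20):  # 20 active-learning queries
    clf = AdaBoostClassifier(n_estimators=50, random_state=0)
    clf.fit(X[labelled], y[labelled])
    # |decision_function| is the distance-like margin to the boundary;
    # the smallest value marks the most uncertain candidate.
    margins = np.abs(clf.decision_function(X[unlabelled]))
    pick = unlabelled[int(np.argmin(margins))]
    labelled.append(pick)      # query the oracle for this sample
    unlabelled.remove(pick)

print(len(labelled))           # labelled pool after 20 queries
```

Each iteration refits from scratch for clarity; in practice one would also track held-out accuracy per query to see the learning-rate benefit the paper claims over random sampling.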
Similar Papers
An anti-spam filter based on one-class IB method in small training sets
We present an approach to email filtering based on the one-class Information Bottleneck (IB) method for small training sets. When the themes of emails change continually, the available training set that is highly relevant to the current theme will be small. Hence, we further show how to estimate the learning algorithm and how to filter spam with small training sets. First, in order to preserv...
Using Validation Sets to Avoid Overfitting in AdaBoost
AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...
The Boosting Approach to Machine Learning: An Overview
Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extension...
VipBoost: A More Accurate Boosting Algorithm
Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in th...
Parameter Inference of Cost-Sensitive Boosting Algorithms
Several cost-sensitive boosting algorithms have been reported as effective methods for dealing with the class-imbalance problem. Misclassification costs, which reflect the different levels of class identification importance, are integrated into the weight-update formula of the AdaBoost algorithm. Yet, it has been shown that the weight-update parameter of AdaBoost is induced so that the training error can b...